Data Challenge¶

The Idea¶

One of my hobbies is baking. I love to bake cakes, cookies, tarts, and all those delicious treats. This hobby sparked my idea for this challenge. We all know about the apple, butter, carrot, or milk in the back of the fridge that is about to spoil. You can still eat it, but it doesn't look appetizing, or it's too much to use up in a short amount of time. So I thought it would be a good challenge to create an app that identifies the ingredients in a picture you upload and uses that list of ingredients to find a recipe, so you don't have to throw those ingredients away.

Approach¶

My approach to this challenge is to start with a CNN model that identifies the ingredients in a picture you upload. From this model, an ingredient list will be produced. That list will then be fed into an NLP model to find a recipe you could make with those ingredients. For this, I spent some time looking for images to train the CNN on, and found a dataset on Roboflow that seems good enough for now.

After finding the data, I started by loading the images and researching how best to train a model to detect the ingredients in the dataset.

Loading the packages¶

Starting off by loading the packages that I am going to use in this notebook.

In [2]:
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import random
import tensorflow as tf
from collections import Counter

from tensorflow.keras import models, layers
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.utils import Sequence
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, GlobalAveragePooling2D, Dense, Reshape

Loading the data¶

Here the data will be loaded. It is already annotated with bounding boxes: each image has a plain-text label file in which every line stores four corner points (eight coordinates) followed by a class name.

In [3]:
dataset_root = "FridgeDetection_data"
splits = ["train", "test", "valid"]

def load_data(split):
    image_dir = os.path.join(dataset_root, split, "images")
    label_dir = os.path.join(dataset_root, split, "labelTxt")
    
    images, labels = [], []
    for file in os.listdir(image_dir):
        if file.lower().endswith((".jpg", ".png")):
            img_path = os.path.join(image_dir, file)
            label_path = os.path.join(label_dir, os.path.splitext(file)[0] + ".txt")
            
            images.append(cv2.imread(img_path))
            labels.append(open(label_path).read() if os.path.exists(label_path) else None)
    
    return images, labels

data = {split: dict(zip(["images", "labels"], load_data(split))) for split in splits}

# Print summary
for split in splits:
    print(f"{split.capitalize()}: Loaded {len(data[split]['images'])} images and {len(data[split]['labels'])} labels.")
Train: Loaded 1521 images and 1521 labels.
Test: Loaded 73 images and 73 labels.
Valid: Loaded 145 images and 145 labels.

The dataset consists of 1521 images for training, 145 images for validation, and 73 images for testing.

Data Understanding¶

The first thing to look at is how the images look when loaded with the labels.

In [4]:
def plot_images_with_labels(images, labels, title, num_images=10, max_columns=2):
    combined = list(zip(images, labels))
    random.shuffle(combined)
    images, labels = zip(*combined[:num_images])

    num_rows = (len(images) + max_columns - 1) // max_columns
    fig, axes = plt.subplots(num_rows, max_columns, figsize=(10, 5 * num_rows))
    axes = axes.flatten() if num_images > 1 else [axes]

    for ax, img, label_text in zip(axes, images, labels):
        img = img.copy()
        if label_text:
            for line in label_text.strip().splitlines():
                parts = line.split()
                if len(parts) >= 9:
                    coords = list(map(int, parts[:8]))
                    label = parts[8]
                    pts = np.array(coords).reshape((4, 2))
                    cv2.polylines(img, [pts], isClosed=True, color=(0, 0, 255), thickness=2)
                    cv2.putText(img, label, tuple(pts[0]), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)

        ax.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        ax.axis("off")

    # Hide unused subplots
    for ax in axes[len(images):]:
        ax.axis("off")

    fig.suptitle(title, fontsize=16)
    plt.subplots_adjust(top=0.92, hspace=0.3)
    plt.show()

# Example usage
plot_images_with_labels(data['train']['images'], data['train']['labels'], "Train Set with Labels")
plot_images_with_labels(data['valid']['images'], data['valid']['labels'], "Valid Set with Labels")
plot_images_with_labels(data['test']['images'], data['test']['labels'], "Test Set with Labels")
[Output: sample images from the train, validation, and test sets with their bounding-box annotations]

As you can see above, the images have one or more labels that are annotated using bounding boxes. Additionally, there are labels that are not useful for the tool I aim to create. These include labels like bags, baskets, trays, and so on. In the next step, I will analyze the data to identify more of these types of labels.

In [5]:
# Function to count labels
def count_labels(data):
    label_counts = Counter()
    for split in data:
        for label_data in data[split]['labels']:
            if label_data:
                for line in label_data.splitlines():
                    parts = line.split()
                    if len(parts) >= 9:
                        label = parts[8]
                        label_counts[label] += 1
    return label_counts

# Count labels across all splits
label_counts = count_labels(data)

# Print the counts
for label, count in label_counts.items():
    print(f"{label}: {count}")
crackers: 189
sausages: 359
sprite: 186
chocolate_drink: 363
coke: 218
orange: 222
apple: 169
paprika: 238
noodles: 191
cereal: 231
grape_juice: 254
basket: 92
orange_juice: 210
scrubby: 238
sponge_opl: 200
cloth_opl: 92
potato_chips: 191
pringles: 213
potato: 251
onion: 344
garlic: 224
butter: 152
eggs: 347
tomato: 199
lemon: 206
tray: 98
help_me_carry_opl: 37

So the data does not have as much variety in ingredients as I hoped, but for at least the first prototype, I think it will have to do. Besides that, there are labels like:

  • basket
  • scrubby
  • sponge_opl
  • cloth_opl
  • tray
  • help_me_carry_opl

These labels do not contribute to the goal of this tool. So I will be looking into removing them from the dataset.
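As a sketch of that cleanup step (my own helper, assuming the same one-label-line-per-object text format used above), something like this could drop the unwanted classes:

```python
# Classes that are containers or cleaning supplies rather than ingredients
IGNORE = {"basket", "scrubby", "sponge_opl", "cloth_opl", "tray", "help_me_carry_opl"}

def filter_labels(label_text):
    """Drop annotation lines whose class name (9th field) is in IGNORE."""
    if not label_text:
        return None
    kept = []
    for line in label_text.strip().splitlines():
        parts = line.split()
        if len(parts) >= 9 and parts[8] not in IGNORE:
            kept.append(line)
    return "\n".join(kept) if kept else None
```

Images whose label data comes back as `None` could then either be removed entirely or kept as negative examples.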

Next, we will be looking into the shape of the images.

In [6]:
# Check the size of the first image in the training set
image_shape = data['train']['images'][0].shape
print(f"Image size: {image_shape}")
Image size: (640, 640, 3)

As you can see, the shape of the images is (640, 640, 3). When making the first model, I ran into the issue that the images were too large for the resources of my computer. Therefore, I will resize them to (224, 224, 3) in the data preparation step.
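To put that decision in perspective, here is a rough back-of-the-envelope estimate (my own arithmetic, assuming the training images are held in memory as float32 arrays):

```python
def dataset_bytes(num_images, side, channels=3, bytes_per_value=4):
    """Approximate in-memory size of a stack of square float32 images."""
    return num_images * side * side * channels * bytes_per_value

full = dataset_bytes(1521, 640)   # original resolution
small = dataset_bytes(1521, 224)  # after resizing
print(f"640x640: {full / 1e9:.1f} GB, 224x224: {small / 1e9:.2f} GB")
# 640x640: 7.5 GB, 224x224: 0.92 GB
```

Roughly an 8x reduction, before counting the intermediate activations the model itself needs.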

Preprocessing¶

I started off by making my life a bit easier and assigning the splits to the usual names: X_train, y_train, and so on.

In [7]:
X_train = data['train']['images']
X_val = data['valid']['images']
y_train = data['train']['labels']
y_val = data['valid']['labels']
X_test = data['test']['images']
y_test = data['test']['labels']

Then I went on to resize the images so that my computer could actually train a model on them.

In [8]:
def resize(img):
    # Convert from OpenCV's BGR channel order to RGB
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Normalize pixel values to [0, 1]
    img = img.astype("float32") / 255.0
    # Clip values to the valid float range [0, 1]
    img = np.clip(img, 0.0, 1.0)
    # Resize last; cubic interpolation can overshoot [0, 1] slightly,
    # which is why clipping would ideally come after this step
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    return img

# Apply preprocessing to all sets
X_train = np.array([resize(img) for img in X_train])
X_val = np.array([resize(img) for img in X_val])
X_test = np.array([resize(img) for img in X_test])
In [9]:
image_shape = X_train[0].shape
print(f"Image size: {image_shape}")
Image size: (224, 224, 3)

Now that the images are resized, I also have to scale the bounding-box coordinates so the labels still line up with the objects.

In [10]:
def resize_labels(labels, original_size, new_size):
    resized_labels = []
    scale_x = new_size[0] / original_size[0]
    scale_y = new_size[1] / original_size[1]
    
    for label_data in labels:
        if label_data:
            resized_label_data = []
            for line in label_data.splitlines():
                parts = line.split()
                if len(parts) >= 9:
                    label = parts[8]
                    # Scale x coordinates (even indices) and y coordinates (odd indices)
                    coords = [int(int(p) * (scale_x if i % 2 == 0 else scale_y))
                              for i, p in enumerate(parts[:8])]
                    resized_label_data.append(" ".join(map(str, coords)) + f" {label}")
            resized_labels.append("\n".join(resized_label_data))
        else:
            resized_labels.append(None)
    
    return resized_labels

# Example usage
original_size = (640, 640)  # Original images are 640x640
new_size = (224, 224)       # Resized images are 224x224

# Resize labels for train, validation, and test sets
resized_train_labels = resize_labels(y_train, original_size, new_size)
resized_val_labels = resize_labels(y_val, original_size, new_size)
resized_test_labels = resize_labels(y_test, original_size, new_size)

# Print a few examples to verify
print("Original label:", y_train[0])
print("Resized label:", resized_train_labels[0])
Original label: 121 107 522 107 522 418 121 418 crackers 0
Resized label: 42 37 182 37 182 146 42 146 crackers

Now that the labels have been rescaled, plotting the images with their boxes again will show whether everything still lines up. (The trailing 0 in the original label is a difficulty flag from the annotation format; it gets dropped here since only the coordinates and class name are needed.)

In [11]:
def plot_images_with_labels(images, labels, title, num_images=10, max_columns=2):
    # Shuffle images and labels together
    combined = list(zip(images, labels))
    random.shuffle(combined)
    images, labels = zip(*combined)
    
    num_rows = (min(num_images, len(images)) + max_columns - 1) // max_columns
    plt.figure(figsize=(10, 5 * num_rows))
    
    for i in range(min(num_images, len(images))):
        img = images[i].copy()  # Make a copy of the image to avoid modifying the original image
        if labels[i]:
            for line in labels[i].splitlines():
                parts = line.split()
                if len(parts) >= 9:
                    # Extract the coordinates and label
                    x1, y1, x2, y2, x3, y3, x4, y4, label = parts[:9]
                    
                    # Convert the coordinates to integers
                    x1, y1, x2, y2, x3, y3, x4, y4 = map(int, [x1, y1, x2, y2, x3, y3, x4, y4])
                    
                    # Draw the label text
                    cv2.putText(img, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
                    
                    # Draw the bounding box polygon
                    cv2.polylines(img, [np.array([[x1, y1], [x2, y2], [x3, y3], [x4, y4]], np.int32)], 
                                   isClosed=True, color=(0, 0, 255), thickness=2)
                
        
        # Plot the image
        plt.subplot(num_rows, max_columns, i + 1)
        plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  # NOTE: resize() already converted to RGB, so this swaps the channels again
        plt.axis("off")
    
    # Set title and show the plot
    plt.suptitle(title)
    plt.show()

# Plot 10 random images from the train set with labels
plot_images_with_labels(X_train, resized_train_labels, "Train Set with Labels")

# Plot 10 random images from the validation set with labels
plot_images_with_labels(X_val, resized_val_labels, "Valid Set with Labels")

y_train = resized_train_labels
y_val = resized_val_labels
y_test = resized_test_labels
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). Got range [-0.095000304..255.0].
(The same clipping warning repeats for each of the ten plotted images.)
[Output: train set images with rescaled bounding-box annotations]
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). Got range [0.0..255.0].
(The same clipping warning repeats for each of the ten plotted images.)
[Output: validation set images with rescaled bounding-box annotations]

The labels and images seem to be resized correctly, but the colors do look slightly off. The clipping warnings above hint at the likely causes: the plotting function converts BGR to RGB a second time even though resize() already did, the annotations are drawn with a color value of 255 onto images normalized to [0, 1], and cubic interpolation overshoots the valid range slightly. I have, however, decided to leave the non-ingredient labels and this color problem for later so I can test a few initial CNN models first.
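A quick range check illustrates the drawing part of the problem (a diagnostic sketch, not part of the pipeline; the direct array write stands in for what cv2.polylines / cv2.putText do internally):

```python
import numpy as np

# Simulate one preprocessed image: float32 values in [0, 1]
rng = np.random.default_rng(0)
img = rng.random((224, 224, 3), dtype=np.float32)

# Drawing with color=(0, 0, 255) writes the raw value 255 into the array,
# far outside the normalized [0, 1] range imshow expects for floats
img[10:12, 10:100, 2] = 255.0  # stand-in for one red box edge

print(img.min(), img.max())  # max is now 255.0
```

That mixed range is exactly what triggers matplotlib's "Clipping input data" warning.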

Next, I will be encoding the labels into a multi-hot format for modeling.

In [12]:
# Preprocess labels into multi-hot encoded format
def encode_labels(labels, label_counts):
    label_to_index = {label: idx for idx, label in enumerate(label_counts.keys())}
    num_classes = len(label_counts)
    
    encoded_labels = []
    for label_data in labels:
        multi_hot = np.zeros(num_classes, dtype=np.float32)
        if label_data:
            for line in label_data.splitlines():
                parts = line.split()
                if len(parts) >= 9:
                    label = parts[8]
                    if label in label_to_index:
                        multi_hot[label_to_index[label]] = 1.0
        encoded_labels.append(multi_hot)
    return np.array(encoded_labels)

# Apply preprocessing to train, validation, and test labels
y_train_encoded = encode_labels(y_train, label_counts)
y_val_encoded = encode_labels(y_val, label_counts)
y_test_encoded = encode_labels(y_test, label_counts)
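For later use (feeding predictions into the recipe search), the multi-hot vectors will need to be turned back into ingredient names. A minimal decoding sketch, assuming the same class ordering as in encode_labels (the names and threshold below are illustrative):

```python
import numpy as np

def decode_labels(multi_hot, label_names, threshold=0.5):
    """Map a multi-hot (or predicted-probability) vector back to class names."""
    return [name for name, value in zip(label_names, multi_hot) if value >= threshold]

label_names = ["apple", "onion", "garlic", "butter"]  # illustrative ordering
vector = np.array([1.0, 0.0, 0.8, 0.2], dtype=np.float32)
print(decode_labels(vector, label_names))  # ['apple', 'garlic']
```

The same function works on raw sigmoid outputs, which is why it takes a threshold.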

Moving on to the modeling section.

Modeling¶

I started by creating my own model just to see what would happen. I did not expect decent results, since detecting multiple object categories in one image calls for a more complex approach than the plain classifier I implemented here.

In [13]:
# Define the number of classes based on the label_counts
num_classes = len(label_counts)

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes, activation='sigmoid')  # sigmoid for multi-label
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Fit the model with preprocessed labels
history = model.fit(X_train, y_train_encoded, epochs=50, batch_size=32, validation_data=(X_val, y_val_encoded))
Epoch 1/50
48/48 [==============================] - 7s 68ms/step - loss: 0.3585 - accuracy: 0.0559 - val_loss: 0.2943 - val_accuracy: 0.0621
Epoch 2/50
48/48 [==============================] - 3s 52ms/step - loss: 0.2970 - accuracy: 0.1249 - val_loss: 0.2784 - val_accuracy: 0.0759
Epoch 3/50
48/48 [==============================] - 2s 52ms/step - loss: 0.2773 - accuracy: 0.1492 - val_loss: 0.2507 - val_accuracy: 0.1724
Epoch 4/50
48/48 [==============================] - 2s 51ms/step - loss: 0.2307 - accuracy: 0.2373 - val_loss: 0.2497 - val_accuracy: 0.1724
Epoch 5/50
48/48 [==============================] - 3s 52ms/step - loss: 0.1734 - accuracy: 0.3761 - val_loss: 0.2598 - val_accuracy: 0.2345
Epoch 6/50
48/48 [==============================] - 3s 52ms/step - loss: 0.1057 - accuracy: 0.5444 - val_loss: 0.2935 - val_accuracy: 0.1931
Epoch 7/50
48/48 [==============================] - 3s 53ms/step - loss: 0.0491 - accuracy: 0.6542 - val_loss: 0.3573 - val_accuracy: 0.2621
Epoch 8/50
48/48 [==============================] - 3s 53ms/step - loss: 0.0228 - accuracy: 0.6870 - val_loss: 0.4496 - val_accuracy: 0.2414
Epoch 9/50
48/48 [==============================] - 3s 53ms/step - loss: 0.0112 - accuracy: 0.6897 - val_loss: 0.4796 - val_accuracy: 0.2345
Epoch 10/50
48/48 [==============================] - 3s 53ms/step - loss: 0.0048 - accuracy: 0.6844 - val_loss: 0.5677 - val_accuracy: 0.2000
Epoch 11/50
48/48 [==============================] - 3s 54ms/step - loss: 0.0027 - accuracy: 0.7022 - val_loss: 0.5576 - val_accuracy: 0.2276
Epoch 12/50
48/48 [==============================] - 3s 53ms/step - loss: 0.0023 - accuracy: 0.7009 - val_loss: 0.5724 - val_accuracy: 0.2207
Epoch 13/50
48/48 [==============================] - 3s 53ms/step - loss: 0.0020 - accuracy: 0.6969 - val_loss: 0.6151 - val_accuracy: 0.2069
Epoch 14/50
48/48 [==============================] - 3s 54ms/step - loss: 0.0016 - accuracy: 0.6989 - val_loss: 0.5812 - val_accuracy: 0.2483
Epoch 15/50
48/48 [==============================] - 3s 54ms/step - loss: 7.3650e-04 - accuracy: 0.6838 - val_loss: 0.6380 - val_accuracy: 0.2414
Epoch 16/50
48/48 [==============================] - 3s 54ms/step - loss: 2.2392e-04 - accuracy: 0.7035 - val_loss: 0.6762 - val_accuracy: 0.2483
Epoch 17/50
48/48 [==============================] - 3s 54ms/step - loss: 9.4416e-05 - accuracy: 0.7035 - val_loss: 0.7048 - val_accuracy: 0.2345
Epoch 18/50
48/48 [==============================] - 3s 54ms/step - loss: 6.6793e-05 - accuracy: 0.7041 - val_loss: 0.7257 - val_accuracy: 0.2276
Epoch 19/50
48/48 [==============================] - 3s 54ms/step - loss: 5.4202e-05 - accuracy: 0.7022 - val_loss: 0.7404 - val_accuracy: 0.2276
Epoch 20/50
48/48 [==============================] - 3s 54ms/step - loss: 4.5623e-05 - accuracy: 0.7035 - val_loss: 0.7542 - val_accuracy: 0.2276
Epoch 21/50
48/48 [==============================] - 3s 54ms/step - loss: 3.9355e-05 - accuracy: 0.7035 - val_loss: 0.7656 - val_accuracy: 0.2276
Epoch 22/50
48/48 [==============================] - 3s 54ms/step - loss: 3.4536e-05 - accuracy: 0.7041 - val_loss: 0.7768 - val_accuracy: 0.2276
Epoch 23/50
48/48 [==============================] - 3s 54ms/step - loss: 3.0583e-05 - accuracy: 0.7048 - val_loss: 0.7860 - val_accuracy: 0.2345
Epoch 24/50
48/48 [==============================] - 3s 54ms/step - loss: 2.7284e-05 - accuracy: 0.7061 - val_loss: 0.7948 - val_accuracy: 0.2276
Epoch 25/50
48/48 [==============================] - 3s 55ms/step - loss: 2.4540e-05 - accuracy: 0.7048 - val_loss: 0.8039 - val_accuracy: 0.2276
Epoch 26/50
48/48 [==============================] - 3s 55ms/step - loss: 2.2273e-05 - accuracy: 0.7048 - val_loss: 0.8127 - val_accuracy: 0.2345
Epoch 27/50
48/48 [==============================] - 3s 55ms/step - loss: 2.0299e-05 - accuracy: 0.7068 - val_loss: 0.8209 - val_accuracy: 0.2345
Epoch 28/50
48/48 [==============================] - 3s 55ms/step - loss: 1.8594e-05 - accuracy: 0.7068 - val_loss: 0.8266 - val_accuracy: 0.2345
Epoch 29/50
48/48 [==============================] - 3s 55ms/step - loss: 1.7104e-05 - accuracy: 0.7048 - val_loss: 0.8333 - val_accuracy: 0.2345
Epoch 30/50
48/48 [==============================] - 3s 55ms/step - loss: 1.5790e-05 - accuracy: 0.7035 - val_loss: 0.8400 - val_accuracy: 0.2345
Epoch 31/50
48/48 [==============================] - 3s 55ms/step - loss: 1.4605e-05 - accuracy: 0.7055 - val_loss: 0.8466 - val_accuracy: 0.2345
Epoch 32/50
48/48 [==============================] - 3s 55ms/step - loss: 1.3562e-05 - accuracy: 0.7035 - val_loss: 0.8525 - val_accuracy: 0.2345
Epoch 33/50
48/48 [==============================] - 3s 55ms/step - loss: 1.2621e-05 - accuracy: 0.7048 - val_loss: 0.8590 - val_accuracy: 0.2345
Epoch 34/50
48/48 [==============================] - 3s 55ms/step - loss: 1.1787e-05 - accuracy: 0.7048 - val_loss: 0.8655 - val_accuracy: 0.2345
Epoch 35/50
48/48 [==============================] - 3s 55ms/step - loss: 1.1014e-05 - accuracy: 0.7041 - val_loss: 0.8691 - val_accuracy: 0.2345
Epoch 36/50
48/48 [==============================] - 3s 55ms/step - loss: 1.0323e-05 - accuracy: 0.7048 - val_loss: 0.8754 - val_accuracy: 0.2345
Epoch 37/50
48/48 [==============================] - 3s 55ms/step - loss: 9.7067e-06 - accuracy: 0.7041 - val_loss: 0.8817 - val_accuracy: 0.2345
Epoch 38/50
48/48 [==============================] - 3s 55ms/step - loss: 9.1134e-06 - accuracy: 0.7022 - val_loss: 0.8855 - val_accuracy: 0.2345
Epoch 39/50
48/48 [==============================] - 3s 55ms/step - loss: 8.6085e-06 - accuracy: 0.7035 - val_loss: 0.8916 - val_accuracy: 0.2345
Epoch 40/50
48/48 [==============================] - 3s 55ms/step - loss: 8.1046e-06 - accuracy: 0.7035 - val_loss: 0.8947 - val_accuracy: 0.2345
Epoch 41/50
48/48 [==============================] - 3s 55ms/step - loss: 7.6592e-06 - accuracy: 0.7022 - val_loss: 0.8995 - val_accuracy: 0.2345
Epoch 42/50
48/48 [==============================] - 3s 55ms/step - loss: 7.2462e-06 - accuracy: 0.7022 - val_loss: 0.9043 - val_accuracy: 0.2345
Epoch 43/50
48/48 [==============================] - 3s 55ms/step - loss: 6.8657e-06 - accuracy: 0.7022 - val_loss: 0.9090 - val_accuracy: 0.2345
Epoch 44/50
48/48 [==============================] - 3s 55ms/step - loss: 6.5026e-06 - accuracy: 0.7028 - val_loss: 0.9130 - val_accuracy: 0.2345
Epoch 45/50
48/48 [==============================] - 3s 55ms/step - loss: 6.1753e-06 - accuracy: 0.7015 - val_loss: 0.9175 - val_accuracy: 0.2345
Epoch 46/50
48/48 [==============================] - 3s 55ms/step - loss: 5.8646e-06 - accuracy: 0.7022 - val_loss: 0.9215 - val_accuracy: 0.2276
Epoch 47/50
48/48 [==============================] - 3s 55ms/step - loss: 5.5736e-06 - accuracy: 0.7022 - val_loss: 0.9253 - val_accuracy: 0.2345
Epoch 48/50
48/48 [==============================] - 3s 55ms/step - loss: 5.3081e-06 - accuracy: 0.7022 - val_loss: 0.9301 - val_accuracy: 0.2345
Epoch 49/50
48/48 [==============================] - 3s 56ms/step - loss: 5.0526e-06 - accuracy: 0.7015 - val_loss: 0.9336 - val_accuracy: 0.2276
Epoch 50/50
48/48 [==============================] - 3s 55ms/step - loss: 4.8175e-06 - accuracy: 0.7015 - val_loss: 0.9371 - val_accuracy: 0.2276
In [14]:
# Plot accuracy and loss graphs
def plot_training_history(history):
    # Extract accuracy and loss values
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(1, len(acc) + 1)

    # Plot accuracy
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(epochs, acc, 'b', label='Training Accuracy')
    plt.plot(epochs, val_acc, 'r', label='Validation Accuracy')
    plt.title('Training and Validation Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()

    # Plot loss
    plt.subplot(1, 2, 2)
    plt.plot(epochs, loss, 'b', label='Training Loss')
    plt.plot(epochs, val_loss, 'r', label='Validation Loss')
    plt.title('Training and Validation Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()

    plt.tight_layout()
    plt.show()

# Call the function with the history object
plot_training_history(history)
[Plot: training and validation accuracy and loss for the baseline CNN]

The results were as expected: not good. The training loss drops to nearly zero while the validation loss keeps climbing, which is textbook overfitting, and the validation accuracy stalls around 23%.

Transfer Learning¶

Now I am going to create a model using transfer learning.

In [15]:
base_model = MobileNetV2(input_shape=(224, 224, 3), include_top=False, weights='imagenet')
base_model.trainable = False  # Freeze weights

model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    # layers.Dense(64, activation='relu'),
    # layers.Dropout(0.5),
    # layers.Dense(32, activation='relu'),
    # layers.Dropout(0.5),
    layers.Dense(num_classes, activation='sigmoid')

])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Fit the model with preprocessed labels
callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(factor=0.2, patience=5)
]
history = model.fit(X_train, y_train_encoded, epochs=50, batch_size=32, validation_data=(X_val, y_val_encoded), callbacks=callbacks)
# Plot accuracy and loss graphs
plot_training_history(history)
Epoch 1/50
48/48 [==============================] - 3s 43ms/step - loss: 0.3770 - accuracy: 0.1039 - val_loss: 0.2073 - val_accuracy: 0.3862 - lr: 0.0010
Epoch 2/50
48/48 [==============================] - 2s 35ms/step - loss: 0.2401 - accuracy: 0.2505 - val_loss: 0.1765 - val_accuracy: 0.3793 - lr: 0.0010
Epoch 3/50
48/48 [==============================] - 2s 35ms/step - loss: 0.1965 - accuracy: 0.3754 - val_loss: 0.1541 - val_accuracy: 0.4897 - lr: 0.0010
Epoch 4/50
48/48 [==============================] - 2s 35ms/step - loss: 0.1700 - accuracy: 0.4208 - val_loss: 0.1404 - val_accuracy: 0.5310 - lr: 0.0010
Epoch 5/50
48/48 [==============================] - 2s 35ms/step - loss: 0.1525 - accuracy: 0.4845 - val_loss: 0.1290 - val_accuracy: 0.5586 - lr: 0.0010
Epoch 6/50
48/48 [==============================] - 2s 35ms/step - loss: 0.1414 - accuracy: 0.5253 - val_loss: 0.1290 - val_accuracy: 0.5655 - lr: 0.0010
Epoch 7/50
48/48 [==============================] - 2s 35ms/step - loss: 0.1294 - accuracy: 0.5286 - val_loss: 0.1215 - val_accuracy: 0.5586 - lr: 0.0010
Epoch 8/50
48/48 [==============================] - 2s 35ms/step - loss: 0.1210 - accuracy: 0.5556 - val_loss: 0.1161 - val_accuracy: 0.5862 - lr: 0.0010
Epoch 9/50
48/48 [==============================] - 2s 35ms/step - loss: 0.1128 - accuracy: 0.5687 - val_loss: 0.1137 - val_accuracy: 0.6207 - lr: 0.0010
Epoch 10/50
48/48 [==============================] - 2s 35ms/step - loss: 0.1093 - accuracy: 0.5786 - val_loss: 0.1111 - val_accuracy: 0.6276 - lr: 0.0010
Epoch 11/50
48/48 [==============================] - 2s 35ms/step - loss: 0.1034 - accuracy: 0.5838 - val_loss: 0.1104 - val_accuracy: 0.6138 - lr: 0.0010
Epoch 12/50
48/48 [==============================] - 2s 35ms/step - loss: 0.1005 - accuracy: 0.5989 - val_loss: 0.1062 - val_accuracy: 0.6414 - lr: 0.0010
Epoch 13/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0976 - accuracy: 0.6108 - val_loss: 0.1026 - val_accuracy: 0.6138 - lr: 0.0010
Epoch 14/50
48/48 [==============================] - 2s 34ms/step - loss: 0.0915 - accuracy: 0.6062 - val_loss: 0.1030 - val_accuracy: 0.6345 - lr: 0.0010
Epoch 15/50
48/48 [==============================] - 2s 34ms/step - loss: 0.0883 - accuracy: 0.6239 - val_loss: 0.1033 - val_accuracy: 0.6345 - lr: 0.0010
Epoch 16/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0855 - accuracy: 0.6358 - val_loss: 0.1012 - val_accuracy: 0.6414 - lr: 0.0010
Epoch 17/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0817 - accuracy: 0.6292 - val_loss: 0.1041 - val_accuracy: 0.6483 - lr: 0.0010
Epoch 18/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0812 - accuracy: 0.6430 - val_loss: 0.1034 - val_accuracy: 0.6138 - lr: 0.0010
Epoch 19/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0779 - accuracy: 0.6384 - val_loss: 0.1075 - val_accuracy: 0.6138 - lr: 0.0010
Epoch 20/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0793 - accuracy: 0.6266 - val_loss: 0.1046 - val_accuracy: 0.6138 - lr: 0.0010
Epoch 21/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0769 - accuracy: 0.6417 - val_loss: 0.1009 - val_accuracy: 0.6207 - lr: 0.0010
Epoch 22/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0744 - accuracy: 0.6417 - val_loss: 0.1065 - val_accuracy: 0.6345 - lr: 0.0010
Epoch 23/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0727 - accuracy: 0.6594 - val_loss: 0.1016 - val_accuracy: 0.6552 - lr: 0.0010
Epoch 24/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0705 - accuracy: 0.6535 - val_loss: 0.1023 - val_accuracy: 0.6345 - lr: 0.0010
Epoch 25/50
48/48 [==============================] - 2s 34ms/step - loss: 0.0690 - accuracy: 0.6627 - val_loss: 0.1044 - val_accuracy: 0.6483 - lr: 0.0010
Epoch 26/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0691 - accuracy: 0.6489 - val_loss: 0.1096 - val_accuracy: 0.6552 - lr: 0.0010
Epoch 27/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0664 - accuracy: 0.6654 - val_loss: 0.1089 - val_accuracy: 0.6621 - lr: 2.0000e-04
Epoch 28/50
48/48 [==============================] - 2s 34ms/step - loss: 0.0633 - accuracy: 0.6667 - val_loss: 0.1064 - val_accuracy: 0.6552 - lr: 2.0000e-04
Epoch 29/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0640 - accuracy: 0.6752 - val_loss: 0.1059 - val_accuracy: 0.6621 - lr: 2.0000e-04
Epoch 30/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0634 - accuracy: 0.6746 - val_loss: 0.1060 - val_accuracy: 0.6621 - lr: 2.0000e-04
Epoch 31/50
48/48 [==============================] - 2s 35ms/step - loss: 0.0627 - accuracy: 0.6634 - val_loss: 0.1047 - val_accuracy: 0.6552 - lr: 2.0000e-04
[Plot: training and validation accuracy and loss for the transfer-learning model]

This plot looks a lot better than the previous one, with the training and validation curves much closer together. Next, I will evaluate the model properly on the test set.

In [16]:
from sklearn.metrics import classification_report

# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(X_test, y_test_encoded, verbose=1)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")

# For multi-label, also calculate F1-score, precision, recall

# Predict probabilities and binarize with threshold 0.5
y_pred_probs = model.predict(X_test)
y_pred = (y_pred_probs > 0.5).astype(int)

# Get label names in order
label_names = list(label_counts.keys())

print(classification_report(y_test_encoded, y_pred, target_names=label_names, zero_division=0))
3/3 [==============================] - 0s 23ms/step - loss: 0.1099 - accuracy: 0.6712
Test Loss: 0.1099
Test Accuracy: 0.6712
3/3 [==============================] - 0s 17ms/step
                   precision    recall  f1-score   support

         crackers       0.80      1.00      0.89         4
         sausages       0.71      0.91      0.80        11
           sprite       0.50      0.50      0.50         2
  chocolate_drink       0.77      1.00      0.87        10
             coke       0.50      0.75      0.60         4
           orange       1.00      0.80      0.89        10
            apple       0.80      0.50      0.62         8
          paprika       0.73      0.89      0.80         9
          noodles       0.75      0.50      0.60         6
           cereal       0.78      0.78      0.78         9
      grape_juice       0.91      1.00      0.95        10
           basket       0.67      0.33      0.44         6
     orange_juice       0.90      0.82      0.86        11
          scrubby       0.62      0.50      0.56        10
       sponge_opl       0.88      1.00      0.93         7
        cloth_opl       0.00      0.00      0.00         5
     potato_chips       0.88      0.88      0.88         8
         pringles       0.91      1.00      0.95        10
           potato       1.00      0.50      0.67         8
            onion       0.00      0.00      0.00         5
           garlic       0.00      0.00      0.00         7
           butter       1.00      0.62      0.77         8
             eggs       0.67      0.67      0.67         3
           tomato       1.00      0.67      0.80         9
            lemon       1.00      0.80      0.89         5
             tray       0.83      0.71      0.77         7
help_me_carry_opl       1.00      0.50      0.67         2

        micro avg       0.82      0.70      0.75       194
        macro avg       0.73      0.65      0.67       194
     weighted avg       0.76      0.70      0.71       194
      samples avg       0.66      0.63      0.63       194

The test accuracy is around 67%, which is not very high. The per-class results also vary a lot: labels like cloth_opl, onion, and garlic are never predicted correctly (F1 of 0.00), while grape_juice and pringles reach an F1 of 0.95.
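One cheap way to squeeze more out of the multi-label predictions without retraining is to tune the binarisation threshold per label instead of using a fixed 0.5 everywhere. This is a minimal sketch, assuming a held-out validation split; `y_val_encoded` and `X_val` are hypothetical names for such a split:

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(y_true, y_probs, candidates=np.arange(0.1, 0.9, 0.05)):
    """For each label, pick the threshold that maximises F1 on validation data."""
    n_labels = y_true.shape[1]
    thresholds = np.full(n_labels, 0.5)
    for j in range(n_labels):
        scores = [
            f1_score(y_true[:, j], (y_probs[:, j] >= t).astype(int), zero_division=0)
            for t in candidates
        ]
        thresholds[j] = candidates[int(np.argmax(scores))]
    return thresholds

# Hypothetical usage with a validation split:
# thresholds = tune_thresholds(y_val_encoded, model.predict(X_val))
# y_pred = (y_pred_probs >= thresholds).astype(int)
```

Labels that are rarely predicted (like onion or garlic above) often just sit below the 0.5 cutoff, so a lower per-label threshold can recover them at some cost in precision.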

In [17]:
def plot_images_with_bboxes_and_predictions(images, label_strings, predictions, label_names, title, num_images=10, max_columns=2):
    indices = np.random.choice(len(images), size=min(num_images, len(images)), replace=False)
    num_rows = (len(indices) + max_columns - 1) // max_columns
    plt.figure(figsize=(12, 5 * num_rows))

    for i, idx in enumerate(indices):
        img = images[idx].copy()
        # Draw true bounding boxes (green)
        if label_strings[idx]:
            for line in label_strings[idx].splitlines():
                parts = line.split()
                if len(parts) >= 9:
                    coords = list(map(int, parts[:8]))
                    label = parts[8]
                    pts = np.array(coords).reshape((4, 2))
                    cv2.polylines(img, [pts], isClosed=True, color=(0, 255, 0), thickness=2)
                    cv2.putText(img, label, tuple(pts[0]), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

        # Draw predicted class names (blue) at the top-left corner
        pred_label_indices = np.where(predictions[idx] == 1)[0]
        pred_labels = [label_names[j] for j in pred_label_indices]
        for j, pred_label in enumerate(pred_labels):
            cv2.putText(img, pred_label, (10, 25 + 25 * j), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

        # Prepare true label list for title
        true_label_indices = np.where(y_test_encoded[idx] == 1)[0]
        true_labels = [label_names[j] for j in true_label_indices]

        plt.subplot(num_rows, max_columns, i + 1)
        plt.imshow(img)
        plt.axis("off")
        plt.title(f"True: {', '.join(true_labels)}\nPred: {', '.join(pred_labels)}", fontsize=10)

    plt.suptitle(title)
    plt.tight_layout(rect=[0, 0, 1, 0.95])
    plt.show()

# Show test images with bounding boxes and predictions
plot_images_with_bboxes_and_predictions(X_test, resized_test_labels, y_pred, label_names, "Test Images: Bounding Boxes & Predictions")
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). Got range [-0.010279009..255.0].
(similar warning repeated for each of the ten displayed images)
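The clipping warnings from imshow come from small negative pixel values that the preprocessing leaves in the float arrays (the reported ranges are all roughly [-0.04..255.0]). A quick fix, assuming the images are otherwise in the 0–255 range, is to clip and cast before plotting:

```python
import numpy as np

def to_displayable(img):
    """Clip stray negative values and cast to uint8 so imshow stops warning.

    Assumes the image is otherwise in the 0-255 range, as the warning
    messages suggest (ranges like [-0.01..255.0]).
    """
    return np.clip(img, 0, 255).astype(np.uint8)
```

Calling `plt.imshow(to_displayable(img))` instead of `plt.imshow(img)` silences the warnings without changing what is displayed, since matplotlib was already clipping internally.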
[Grid of test images with true bounding boxes (green) and predicted labels (blue)]

As visible above, the model predicts what is in the picture but not where, which makes it unclear whether it actually detects the objects or just recognises the scene. Because of this, I'll be looking into YOLO next, since it is designed for object detection.

I have never worked with YOLO before, and my first attempt in this notebook became a mess because the data needed to be restructured. I therefore moved the entire YOLO section to a separate notebook, so I don't have to worry about accidentally rerunning everything here.
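The restructuring mostly comes down to converting the 8-coordinate polygon labels in `labelTxt` to YOLO's normalised `class cx cy w h` format. A rough sketch of that per-line conversion, assuming axis-aligned boxes are good enough so the polygon is collapsed to its bounding rectangle (`class_ids`, a name-to-index mapping, is a hypothetical name):

```python
def polygon_to_yolo(parts, img_w, img_h, class_ids):
    """Convert one label line of the form 'x1 y1 x2 y2 x3 y3 x4 y4 label ...'
    (as split into parts) to a YOLO line: 'class_id cx cy w h', normalised
    to the image size. The polygon is reduced to its axis-aligned bbox."""
    coords = list(map(float, parts[:8]))
    xs, ys = coords[0::2], coords[1::2]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_ids[parts[8]]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```

On top of this, YOLO expects one `.txt` file per image in a `labels/` directory mirroring `images/`, plus a small dataset YAML listing the class names, which is why doing it mid-notebook got messy.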